The Inclusion Problem for Regular Expressions
Abstract
This paper presents a polynomial-time algorithm for the inclusion problem for a large class of regular expressions. The algorithm is not based on construction of finite automata, and can therefore be faster than the lower bound implied by the Myhill-Nerode theorem. The algorithm automatically discards irrelevant parts of the right-hand expression. The irrelevant parts of the right-hand expression might even be 1-ambiguous. For example, if r is a regular expression such that any DFA recognizing r is very large, the algorithm can still, in time independent of r, decide that the language of ab is included in that of (a + r)b. The algorithm is based on a syntax-directed inference system. It takes arbitrary regular expressions as input. If the 1-ambiguity of the right-hand expression becomes a problem, the algorithm will report this. Otherwise, it will decide the inclusion problem for the input.
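To make the inclusion question in the abstract concrete, here is a minimal Python sketch that checks whether the language of ab is included in that of (a + r)b. It is not the paper's syntax-directed inference system: it swaps in a naive check based on Brzozowski derivatives, so it has neither the polynomial-time guarantee nor the ability to ignore r. The tuple encoding, the function names (`sym`, `alt`, `cat`, `star`, `included`), and the small stand-in chosen for r are all assumptions made for illustration.

```python
# Regular expressions as nested tuples (hypothetical encoding for this sketch).
EMPTY = ("empty",)   # the empty language
EPS = ("eps",)       # the language containing only the empty word

def sym(c):
    return ("sym", c)

def alt(r, s):
    """Union, normalized so that it is associative, commutative, idempotent."""
    parts = set()
    for x in (r, s):
        if x[0] == "alt":
            parts |= x[1]
        elif x != EMPTY:
            parts.add(x)
    if not parts:
        return EMPTY
    if len(parts) == 1:
        return next(iter(parts))
    return ("alt", frozenset(parts))

def cat(r, s):
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def star(r):
    if r in (EMPTY, EPS):
        return EPS
    return r if r[0] == "star" else ("star", r)

def nullable(r):
    """True if the language of r contains the empty word."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return any(nullable(x) for x in r[1])
    return nullable(r[1]) and nullable(r[2])  # cat

def deriv(r, c):
    """Brzozowski derivative: the words w such that c + w is in the language of r."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        out = EMPTY
        for x in r[1]:
            out = alt(out, deriv(x, c))
        return out
    if tag == "star":
        return cat(deriv(r[1], c), r)
    head = cat(deriv(r[1], c), r[2])  # cat
    return alt(head, deriv(r[2], c)) if nullable(r[1]) else head

def included(r1, r2, alphabet):
    """Check L(r1) <= L(r2) by exploring pairs of derivatives."""
    seen, todo = set(), [(r1, r2)]
    while todo:
        pair = todo.pop()
        if pair in seen:
            continue
        seen.add(pair)
        left, right = pair
        if left == EMPTY:
            continue  # no words remain on the left, nothing to refute here
        if nullable(left) and not nullable(right):
            return False  # some word is accepted on the left but not the right
        for c in alphabet:
            todo.append((deriv(left, c), deriv(right, c)))
    return True

a, b = sym("a"), sym("b")
r = star(cat(alt(a, b), a))   # small stand-in for r; the abstract has a large r in mind
lhs = cat(a, b)               # ab
rhs = cat(alt(a, r), b)       # (a + r)b

print(included(lhs, rhs, "ab"))
print(included(rhs, lhs, "ab"))
```

Unlike the algorithm described in the abstract, this check takes derivatives of r along the way, so its cost grows with r; it only illustrates what the inclusion question asks.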
Similar resources
Complexity of Decision Problems for Simple Regular Expressions
We study the complexity of the inclusion, equivalence, and intersection problem for simple regular expressions arising in practical XML schemas. These basically consist of the concatenation of factors where each factor is a disjunction of strings possibly extended with ‘∗’ or ‘?’. We obtain lower and upper bounds for various fragments of simple regular expressions. Although we show that inclusi...
Complexity of Decision Problems for XML Schemas and Chain Regular Expressions
We study the complexity of the inclusion, equivalence, and intersection problem for XML schemas occurring in practice. These schemas make use of regular expressions with a very simple structure: they basically consist of the concatenation of factors, where each factor is a disjunction of strings, possibly extended with “∗”, “+”, or “?”. We refer to these as CHAin Regular Expressions (CHAREs). W...
Inclusion of Unambiguous RE#s is NP-Hard
We show that testing inclusion between languages represented by regular expressions with numerical occurrence indicators (#REs) is NP-hard, even if the expressions satisfy the requirement of “unambiguity”, which is required for XML Schema content model expressions. 1 Proof of the result We have seen before [3] that testing for inclusion and overlap of languages represented by #REs is NP-hard. T...
Optimizing Schema Languages for XML: Numerical Constraints and Interleaving
The presence of a schema offers many advantages in processing, translating, querying, and storage of XML data. Basic decision problems like equivalence, inclusion, and non-emptiness of intersection of schemas form the basic building blocks for schema optimization and integration, and algorithms for static analysis of transformations. It is thereby paramount to establish the exact complexity of ...
Regular Expressions with Numerical Occurrence Indicators - preliminary results
Regular expressions with numerical occurrence indicators (#REs) are used in established text manipulation tools like Perl and Unix egrep, and in the recent W3C XML Schema Definition Language. Numerical occurrence indicators do not increase the expressive power of regular expressions, but they do increase the succinctness of expressions by an exponential factor. Therefore methods based on straig...